Reinforcement Learning Benchmarks and Bake-offs II

Authors

  • Alain Dutech
  • Tim Edmunds
  • Jelle Kok
  • Michail Lagoudakis
  • Michael Littman
  • Martin Riedmiller
  • Brian Russell
  • Bruno Scherrer
  • Rich Sutton
  • Stephan Timmer
  • Nikos Vlassis
  • Adam White
  • Shimon Whiteson
  • Dinakar Jayarajan
Abstract

Evolution of neural networks, through genetic algorithms or otherwise, has recently emerged as a possible way to solve challenging reinforcement learning problems; NEAT (NeuroEvolution of Augmenting Topologies) is a particularly powerful such method. Because it is based on evolving a population of solutions, where individuals are evaluated on a number of test episodes, the current setup of the RL Benchmarking Event tasks makes a direct comparison with it impossible. In particular, neuroevolution methods are geared towards finding the best solution (instead of optimizing average reward over the course of learning) and towards reinforcement over the entire episode (instead of during the episode). NEAT in particular is best suited to challenging tasks where topology evolution has time to take effect, especially tasks that are non-Markovian and therefore require recurrency. Nevertheless, it is instructive to examine how NEAT solves these problems, and where its particular strengths and weaknesses lie.

1 The NEAT Algorithm

The NeuroEvolution of Augmenting Topologies (NEAT) method [6] is a policy-search reinforcement learning algorithm that uses a genetic algorithm to search for optimal neural network policies. NEAT automatically evolves network topology to fit the complexity of the problem by combining the usual search for appropriate network weights with complexification of the network structure. By starting with simple networks and expanding the search space only when beneficial, NEAT is able to find significantly more complex controllers than fixed-topology learning algorithms. This approach is highly effective: NEAT outperforms other neuroevolution (NE) methods on complex control tasks such as the double pole balancing task [5, 6] and the robotic strategy-learning domain [7]. These properties make NEAT an attractive method for evolving neural networks in complex tasks. In this section, the NEAT method is briefly reviewed; see [5, 6, 7] for more detailed descriptions.

NEAT is based on three key ideas. First, evolving network structure requires a flexible genetic encoding. Each genome in NEAT includes a list of connection genes, each of which refers to two node genes being connected. Each connection gene specifies the in-node, the out-node, the weight of the connection, whether or not the connec...
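
To make the encoding concrete, the following is a minimal sketch of a NEAT-style genome in Python. In the full description [6], each connection gene also carries an enable bit and an innovation number (a historical marking used to align genes during crossover); the class and method names here are illustrative assumptions, not taken from any particular NEAT implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of NEAT's genetic encoding; real implementations
# (e.g. the original NEAT code or NEAT-Python) differ in detail.

@dataclass
class ConnectionGene:
    in_node: int      # id of the source node gene
    out_node: int     # id of the target node gene
    weight: float     # connection weight
    enabled: bool     # whether the connection is expressed
    innovation: int   # historical marking used to align genes in crossover

@dataclass
class Genome:
    node_ids: List[int]                                  # input, output, hidden nodes
    connections: List[ConnectionGene] = field(default_factory=list)

    def add_connection(self, in_node: int, out_node: int,
                       weight: float, innovation: int) -> None:
        """Structural mutation: add a new connection gene."""
        self.connections.append(
            ConnectionGene(in_node, out_node, weight, True, innovation))

    def add_node(self, conn: ConnectionGene, new_node: int,
                 innov_a: int, innov_b: int) -> None:
        """Structural mutation: split an existing connection by inserting
        a node. The old gene is disabled and two new genes replace it:
        in -> new (weight 1.0) and new -> out (the old weight)."""
        conn.enabled = False
        self.node_ids.append(new_node)
        self.connections.append(
            ConnectionGene(conn.in_node, new_node, 1.0, True, innov_a))
        self.connections.append(
            ConnectionGene(new_node, conn.out_node, conn.weight, True, innov_b))
```

The innovation numbers are what let NEAT line up corresponding genes from two parent genomes during crossover, and speciation protects new structure while its weights are being optimized.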

Related Articles

Melioration learning in games with constant and frequency-dependent pay-offs

The paper explores the implications of melioration learning—an empirically significant variant of reinforcement learning—for game theory. We show that in games with invariable pay-offs melioration learning converges to Nash equilibria in a way similar to the replicator dynamics. Since melioration learning is known to deviate from optimizing behavior when an action’s rewards decrease with increa...

Traditional Chinese Parsing Evaluation at SIGHAN Bake-offs 2012

This paper presents the overview of the traditional Chinese parsing task at SIGHAN Bake-offs 2012. On behalf of the task organizers, we describe all aspects of the task for traditional Chinese parsing, i.e., task description, data preparation, performance metrics, and evaluation results. We summarize the performance results of all participant teams in this evaluation, in the hope of encouraging more futu...

Ontogenetic and Phylogenetic Reinforcement Learning

Reinforcement learning (RL) problems come in many flavours, as do the algorithms for solving them. It is currently not clear which of the commonly used RL benchmarks best measure an algorithm's capacity for solving real-world problems. Similarly, it is not clear which types of RL algorithms are best suited to solve which kinds of RL problems. Here we present some dimensions along the axes of whi...

DeepMind Control Suite

The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly av...
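
As a rough illustration of the standardised task structure this abstract describes, a Control Suite task can be loaded and stepped as in the sketch below (assuming the dm_control Python package; the cartpole/swingup pairing and the random agent are illustrative choices):

```python
import numpy as np
from dm_control import suite

# Load one of the benchmark tasks; cartpole/swingup is one example pairing.
env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # A random agent: sample uniformly within the action bounds.
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    # time_step.reward carries the interpretable, bounded reward signal.
```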

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, nonparametric function approximator for learning on Q-function residuals. And second, we propose an exploration strategy...
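
The boosted function approximator can be pictured as an additive ensemble: each round fits a new regressor to the TD residuals the current ensemble leaves behind. The sketch below illustrates that general idea only; the class name, the batch interface, and the use of scikit-learn decision trees are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedQ:
    """Sketch: Q(s, a) approximated as a sum of regression trees,
    each fit to the TD residuals left by the previous ensemble."""

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma
        self.trees = []  # additive ensemble of weak learners

    def predict(self, x):
        # x: array of concatenated (state, action) feature rows
        if not self.trees:
            return np.zeros(len(x))
        return np.sum([t.predict(x) for t in self.trees], axis=0)

    def boost(self, x, reward, x_next_best, done):
        """One boosting round on a batch of transitions.
        x_next_best: features of (s', argmax_a' Q(s', a'))."""
        target = reward + self.gamma * (1.0 - done) * self.predict(x_next_best)
        residual = target - self.predict(x)  # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=3).fit(x, residual)
        self.trees.append(tree)
```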

Publication date: 2005